Skip to content

fix(eval): preserve custom eval metadata#5922

Open
he-yufeng wants to merge 1 commit into
google:mainfrom
he-yufeng:fix/eval-extra-metadata-fresh
Open

fix(eval): preserve custom eval metadata#5922
he-yufeng wants to merge 1 commit into
google:mainfrom
he-yufeng:fix/eval-extra-metadata-fresh

Conversation

@he-yufeng
Copy link
Copy Markdown
Contributor

Summary

  • allow evaluation models to preserve caller-provided metadata fields
  • add a regression test covering extra fields on SessionInput and EvalCase

Fixes #5906

To verify

  • PYTHONPATH=src python -m pytest tests/unittests/evaluation/test_eval_case.py -q
  • python -m pyink --check src/google/adk/evaluation/common.py tests/unittests/evaluation/test_eval_case.py
  • python -m ruff check src/google/adk/evaluation/common.py tests/unittests/evaluation/test_eval_case.py
  • git diff --check

@adk-bot adk-bot added the eval [Component] This issue is related to evaluation label May 31, 2026
@rohityan rohityan self-assigned this Jun 1, 2026
@he-yufeng he-yufeng force-pushed the fix/eval-extra-metadata-fresh branch 2 times, most recently from 9516053 to 1d17dd7 Compare June 2, 2026 19:14
@he-yufeng
Copy link
Copy Markdown
Contributor Author

Rebased this branch onto the latest main.

Validation on the rebased branch:

  • PYTHONPATH=src .venv\Scripts\python.exe -m pytest tests\unittests\evaluation\test_eval_case.py -q
  • PYTHONPATH=src .venv\Scripts\python.exe -m py_compile src\google\adk\evaluation\common.py tests\unittests\evaluation\test_eval_case.py
  • git diff --check upstream/main...HEAD

Note: using the repo-local .venv is necessary on my Windows machine; the global conda environment has an older/incompatible google-genai install.

@adk-bot
Copy link
Copy Markdown
Collaborator

adk-bot commented Jun 5, 2026

I will begin by reading the instructions for the adk-pr-analyze skill to ensure I follow the correct procedure for analyzing the pull request.I will run the Contributor License Agreement (CLA) verification script to ensure the contributor has signed the Google CLA before proceeding with any analysis of PR #5922.I will fetch the details of GitHub Pull Request #5922 using the custom fetch_github_pr tool.I will view the triage_pr.py helper script to see how it performs verification and if we can perform a equivalent, permissible check or find another way to handle CLA verification.I will fetch the status check information for PR #5922 to verify the CLA signature status of the contributor using the gh command.I will fetch the details of the linked issue #5906 using the custom fetch_github_issue tool.I will retrieve the complete line-by-line diff of PR #5922 using the gh tool.I will view the src/google/adk/evaluation/common.py file to inspect its content in our baseline workspace.I will view the unit test file tests/unittests/evaluation/test_eval_case.py in our workspace to inspect how tests are designed around evaluation cases.I will list the contents of the src/google/adk/evaluation folder to see what other files are there and how the evaluation module is structured.I will search for extra="forbid" occurrences in the codebase to see if other modules are similarly configured and whether this strict configuration is localized.I will search for the definition of SessionInput and EvalBaseModel in the codebase to find their exact locations and line numbers, enabling me to include direct links in the report.I will view the SessionInput class in src/google/adk/evaluation/eval_case.py to check its structure and verify its inheritance.I will write the complete and in-depth PR Analysis Report of PR #5922 to the artifact directory. This report details the change objective, the impact and execution flow, its structural and architectural value, and evaluates the implementation against Google ADK's principles.I have successfully completed the PR Analysis for PR #5922.

The detailed PR Analysis Report is now available in your workspace artifacts.

Below is an executive summary of my findings:

🔍 ADK Pull Request Analysis: PR #5922

  • Title: fix(eval): preserve custom eval metadata
  • Author: @he-yufeng
  • Status: open
  • Impact: 29 additions, 1 deletions across 2 files

Executive Summary

  1. Core Objective: Changes Pydantic's EvalBaseModel configuration in common.py from extra="forbid" to extra="allow". This ensures all children evaluation structures (like SessionInput and EvalCase) preserve custom metadata parameters instead of breaking or throwing validation errors.
  2. Justification & Value: [Justified Fix] — Essential usability enhancement that removes a significant developer bottleneck by allowing custom evaluation metrics correlation tags (e.g. tier, category) to pass cleanly through evaluation runs.
  3. Alignment with Principles: [Pass] — Code styling, module structures, and tests conform perfectly to ADK standards.
  4. Recommendation: [Approve with Nits] — Highly polished contribution. Recommending merging immediately, following a minor nit to append -> None onto the return type annotation in the newly added test suite function.

Key Open Questions or Decisions

  • Review/Merging: I highly recommend approving and merging this change. Do you want me to write a comment on the PR on GitHub containing this feedback, or would you like to review the generated [PR Analysis Report](file:///home/runner/.gemini/antigravity/brain/034a60480025faad4acccfb4432831c4/

@rohityan rohityan added the request clarification [Status] The maintainer need clarification or more information from the author label Jun 5, 2026
@rohityan
Copy link
Copy Markdown
Collaborator

rohityan commented Jun 5, 2026

Hi @he-yufeng , Thank you for your contribution! We appreciate you taking the time to submit this pull request.

@rohityan rohityan removed the request clarification [Status] The maintainer need clarification or more information from the author label Jun 5, 2026
@he-yufeng he-yufeng force-pushed the fix/eval-extra-metadata-fresh branch from 0d31501 to 4c60c7a Compare June 6, 2026 18:40
@he-yufeng
Copy link
Copy Markdown
Contributor Author

Rebased this branch onto current main; the PR diff is back to the intended two files.

Validated locally after the rebase:

  • python -m py_compile src/google/adk/evaluation/common.py tests/unittests/evaluation/test_eval_case.py
  • PYTHONPATH=src C:\dev\GITHUB-clean\adk-python\.venv\Scripts\python.exe -m pytest -q tests/unittests/evaluation/test_eval_case.py -k metadata --basetemp .tmp\pytest-5922 -p no:cacheprovider (1 passed)
  • git diff --check upstream/main..HEAD

I could not run local pyink from the reused ADK venv because that environment does not include pyink; CI will rerun the full pre-commit suite.

@adk-bot
Copy link
Copy Markdown
Collaborator

adk-bot commented Jun 6, 2026

I have successfully analyzed Google ADK Pull Request PR #5922.

  • Google CLA Validation: Verified successfully (Status: Signed & Active).
  • Scope & Correctness: Fully validated against the baseline workspace and the linked issue Issue #5906.

The detailed PR Analysis Report is now available in your workspace artifacts.

Summary of Work Done

  1. CLA Verification: Verified the contributor's CLA validation success status.
  2. Context Retrieval: Fetched PR payload and linked Issue #5906 details.
  3. Workspace inspection: Searched and reviewed the baseline configurations of EvalBaseModel in common.py, subclasses under eval_case.py, and the regression tests suite test_eval_case.py.
  4. Report Authoring: Penned down a thorough aesthetic and architectural assessment report detailing design alternatives and standard compliance.

Key Recommendation

  • Decision: Approve & Merge
  • Since the contribution is extremely elegant and uses the standard Pydantic configuration (extra="allow") to gracefully preserve user-supplied tags on evaluation base structures, we recommend merging this PR. Only a minor, non-blocking suggestion is highlighted in the report regarding test function formatting conventions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

eval [Component] This issue is related to evaluation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EvalBaseModel extra='forbid' rejects custom fields in session_input and eval cases

3 participants